Skip to content

Relax dependency constraints#1713

Merged
SFJohnson24 merged 9 commits into
cdisc-org:mainfrom
filippsatverily:filipps/update_deps_main
Jun 5, 2026
Merged

Relax dependency constraints#1713
SFJohnson24 merged 9 commits into
cdisc-org:mainfrom
filippsatverily:filipps/update_deps_main

Conversation

@filippsatverily

@filippsatverily filippsatverily commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Summary: expand dependency constraints and turn requirements.txt from a requirements file into a lockfile. This will allow the PyPi version of CORE to be used as a library in a larger project where other dependency constraints exist.

List of changes:

  • Define dependency constraints in pyproject.toml
    • Remove requirements-dev.txt
    • Repurpose requirements.txt as a lockfile
  • Change constraints from exact pinned versions to ranges
  • Dependency range upgrades:
    • Remove CLI defaults that clash with required to support a behavior change/fix in click 8.3.0
    • Edit DatasetXPTMetadataReader.read to support a behavior change in pyreadstat 1.2.9
    • Edit USDMDataService.__get_full_path to support a behavior change in jsonpath-ng 1.8.0
    • Remove unnecessary self capture in ContentsDefineVLMDatasetBuilder to support dask 2024.8.1
    • Remove __setitem__ reindexing in DaskDataset to fix errors surfaced in dask 2025.4.0
    • Fix test_dataset_metadata_define_dataset_builder to support a sort behavior fix in pandas 2.2.0

Tested scenarios:

@filippsatverily filippsatverily force-pushed the filipps/update_deps_main branch from 14ffff8 to a646ccf Compare April 29, 2026 01:38
@filippsatverily filippsatverily marked this pull request as draft April 29, 2026 01:41
@filippsatverily filippsatverily marked this pull request as ready for review April 30, 2026 22:57
@SFJohnson24

Copy link
Copy Markdown
Collaborator

@filippsatverily nice work, this looks good-- can you please resolve the merge conflicts so we can get this merged.

Moves dependency constraints to pyproject.toml.
Makes requirements.txt a lockfile.
Fixes an incompatibility caused by click 8.3.0, which passes the default value as-is.
Fixes an incompatibility caused by pyreadstat 1.2.9, which changed original_variable_type from 'NULL' to None
Works around an behavior change in jsonpath-ng 1.8.0 where Child.str gets wrapped in parenthesis.
Fixes tokenization errors when using dask 2024.8.1+. Starting with this
version, dask enforces that tokens remain stable across pickle
round-trips (dask/dask#11320). Capturing self in a lambda fails this
check because instance objects can have non-deterministic pickle
representations. Since calculate_variable_value_length is already a
static method, replacing self with the class name is enough to remove
the capture.
Fixes an import error caused by dask 2024.12.1, which removed the
legacy dask.dataframe.dd submodule (dask/dask#11604). Changes the
import to `import dask.dataframe as dd`, consistent with every other
file in the codebase.
Dask 2025.4.0 optimizes multiple DataFrames together, which exposes
division mismatches when assigning a pandas Series to a dask DataFrame
column. The old reset_index/set_index workaround no longer avoids this.
Replacing it with compute-assign-rewrap via dd.from_pandas, which builds
a clean expression graph. This is safe because __getitem__ already
computes the DataFrame to produce the Series being assigned.
Fixes a unit test to support pandas 2.2.0+. The pandas release fixes a
sorting bug with pandas-dev/pandas#54611. This
commit changes the expected results accordingly.

Also fixes a merge type mismatch introduced by upstream cdisc-org#1709: the
codelist metadata side was cast to StringDtype but the evaluation
dataset side was not. With pandas 2.2.0, empty columns infer as float64,
and merging float64 with string is rejected. Casting both sides to
string before the merge resolves this.
@filippsatverily filippsatverily force-pushed the filipps/update_deps_main branch from 43a93e8 to 4e9f24e Compare June 5, 2026 00:41
@filippsatverily

Copy link
Copy Markdown
Contributor Author

@filippsatverily nice work, this looks good-- can you please resolve the merge conflicts so we can get this merged.

@SFJohnson24 Please see updated PR. I added a couple small changes to accommodate the code churn over the last 2 months.

@SFJohnson24 SFJohnson24 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR correctly updates dependencies and and resolves several other issues

Edit: I reverted this PR after merging & running our test suite with the changes. It appears to have broken several USDM as well as two rules in Dask. I am still sorting the details for both and will report back as I parse the details.

  • With Dask, this appears to be related to CORE-000334 and CORE-000355 both reporting Cannot mask with non-boolean array containing NA / NaN values in get_error_rows() which I suspect is a downstream result of the Dask indexing changes.
    pdf[key] = value.values may be the issue as .values is stripping the index from the series, making it a raw numpy array. This is then assigned positionally, giving a length mismatch causing downstream NaN indexing. I think adding
    'pdf[key] = value.reindex(pdf.index)' will resolve this. I made a small push (meant to push to my own branch that fetched your fork but accidentally pushed to your fork, apologies for this) and made another pull request from that to rerun the test suite: #1755

  • there is a slight linter issue

  • 1 unit test is failing: test_dataset_metadata_define_dataset_builder as the result needs the location sort that was added to expected.

  • can you update the docs as pip install -e .; pip install --group dev is the command for windows powershell, your command is for mac/linux; we would like both listed in the docs

  • as for the USDM issues that arose: https://github.com/cdisc-org/cdisc-rules-engine/actions/runs/27025025445?pr=1755 you can scroll to the bottom and download the differences.
    you can parse the logs-- DDF00175/CORE-00085 in the report has attributes: "", empty as well as

image I suspect the changes to __get_full_path are producing subtly different content_path strings that _traverse_path in the USDM data service

@SFJohnson24 SFJohnson24 merged commit 21f7b3a into cdisc-org:main Jun 5, 2026
SFJohnson24 added a commit that referenced this pull request Jun 5, 2026
@SFJohnson24

SFJohnson24 commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

@filippsatverily i tweaked your branch on my own to get immediate feedback from our test suite. #1757
I did tweak some of your code. I tested it against the pilot study and it produces the same results as our main presently. I figured I would let you test it / look it over but I am happy to merge what I have in that test branch with those changes. pilot study results:
CORE-Report-2026-06-05T15-13-40.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants